We collected survey data from passengers traveling through Austin-Bergstrom International Airport. The data was obtained from the US Government website data.gov and contains 37 features and 3501 survey responses. We perform a key driver analysis to understand which features affect customers’ overall satisfaction and what the airport can do to improve its service to passengers.
The first few rows of the data, with all 37 features, look like this:
survey.df <- read.csv("~/OneDrive - Duke University/Coursework/590.21 Marketing Analytics/Project/Airport_Quarterly_Passenger_Survey.csv")
head(survey.df)
## Quarter Date.recorded Departure.time
## 1 3Q16 09/04/2016 11:45
## 2 2Q16 05/01/2016 16:45
## 3 2Q16 04/07/2016 11:10
## 4 3Q16 09/02/2016 17:16
## 5 3Q16 08/04/2016 7:49
## 6 3Q16 08/02/2016 9:45
## Ground.transportation.to.from.airport Parking.facilities
## 1 0 0
## 2 0 0
## 3 4 4
## 4 0 0
## 5 5 0
## 6 5 5
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 0 0
## 2 0 0
## 3 4 5
## 4 0 0
## 5 0 0
## 6 2 0
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 5 0
## 2 5 0
## 3 5 5
## 4 4 0
## 5 4 4
## 6 4 5
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 0 3
## 2 0 2
## 3 5 NA
## 4 0 NA
## 5 4 5
## 6 5 0
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 4 4
## 2 3 3
## 3 NA 5
## 4 NA 4
## 5 5 2
## 6 0 5
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 5 2
## 2 0 2
## 3 5 5
## 4 4 2
## 5 3 2
## 6 5 5
## Feeling.of.safety.and.security
## 1 4
## 2 3
## 3 5
## 4 3
## 5 3
## 6 5
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 5 5
## 2 5 5
## 3 0 NA
## 4 4 4
## 5 4 3
## 6 5 5
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 5 0
## 2 4 0
## 3 0 0
## 4 4 0
## 5 5 0
## 6 5 0
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0 0 0
## 2 0 4 3
## 3 5 5 5
## 4 0 0 2
## 5 4 4 4
## 6 5 5 5
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 0 0
## 2 0 0
## 3 0 5
## 4 0 0
## 5 3 4
## 6 0 5
## Shopping.facilities..value.for.money. Internet.access
## 1 0 0
## 2 0 4
## 3 0 0
## 4 0 0
## 5 3 2
## 6 5 5
## Business.executive.lounges Availability.of.washrooms
## 1 0 4
## 2 0 0
## 3 0 5
## 4 0 4
## 5 2 4
## 6 5 5
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 0 4
## 2 0 4
## 3 5 5
## 4 4 4
## 5 4 2
## 6 5 5
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 5 4
## 2 4 4
## 3 5 5
## 4 4 4
## 5 5 4
## 6 5 5
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 4 0
## 2 4 0
## 3 NA 0
## 4 4 0
## 5 4 0
## 6 5 0
## Customs.inspection Overall.satisfaction
## 1 0 0
## 2 0 0
## 3 5 0
## 4 0 0
## 5 4 0
## 6 5 0
The summary below shows that the data has multiple NA values and needs to be cleaned. The exploratory data analysis also showed that the airport changed its date and time format in 2016, so the “Departure.time” feature does not have a consistent format.
summary(survey.df)
## Quarter Date.recorded Departure.time
## 2Q15 : 352 05/11/2017: 56 8:00 : 43
## 1Q17 : 351 11/04/2016: 56 12:25 PM: 36
## 4Q15 : 351 01/05/2017: 55 18:00 : 30
## 1Q15 : 350 01/10/2017: 55 19:20 : 30
## 1Q16 : 350 05/03/2017: 55 9:25 : 30
## 2Q16 : 350 11/12/2016: 55 14:45 : 29
## (Other):1397 (Other) :3169 (Other) :3303
## Ground.transportation.to.from.airport Parking.facilities
## Min. :0.000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:0.00
## Median :2.000 Median :0.00
## Mean :2.191 Mean :1.13
## 3rd Qu.:4.000 3rd Qu.:3.00
## Max. :5.000 Max. :5.00
## NA's :54 NA's :39
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :1.017 Mean :1.036
## 3rd Qu.:2.000 3rd Qu.:2.000
## Max. :5.000 Max. :5.000
## NA's :46 NA's :91
## Efficiency.of.check.in.staff Check.in.wait.time
## Min. :0.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.:3.000
## Median :5.000 Median :5.000
## Mean :3.778 Mean :3.789
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :38 NA's :39
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## Min. :0.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.:2.000
## Median :5.000 Median :4.000
## Mean :3.778 Mean :3.347
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :52 NA's :69
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## Min. :0.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.:4.000
## Median :4.000 Median :4.000
## Mean :3.456 Mean :3.962
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :96 NA's :31
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## Min. :0.000 Min. :0.000
## 1st Qu.:4.000 1st Qu.:3.000
## Median :4.000 Median :4.000
## Mean :4.082 Mean :4.019
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :46 NA's :50
## Feeling.of.safety.and.security
## Min. :0.000
## 1st Qu.:4.000
## Median :5.000
## Mean :4.192
## 3rd Qu.:5.000
## Max. :5.000
## NA's :43
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## Min. :0.000 Min. :0.000
## 1st Qu.:4.000 1st Qu.:4.000
## Median :5.000 Median :5.000
## Mean :4.506 Mean :4.229
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :36 NA's :26
## Walking.distance.inside.terminal Ease.of.making.connections
## Min. :0.000 Min. :0.0000
## 1st Qu.:4.000 1st Qu.:0.0000
## Median :5.000 Median :0.0000
## Mean :4.397 Mean :0.3602
## 3rd Qu.:5.000 3rd Qu.:0.0000
## Max. :5.000 Max. :5.0000
## NA's :37 NA's :83
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## Min. :0.00 Min. :0.000 Min. :0.000
## 1st Qu.:3.00 1st Qu.:0.000 1st Qu.:0.000
## Median :4.00 Median :4.000 Median :3.000
## Mean :3.59 Mean :2.969 Mean :2.548
## 3rd Qu.:5.00 3rd Qu.:5.000 3rd Qu.:4.000
## Max. :5.00 Max. :5.000 Max. :5.000
## NA's :40 NA's :59 NA's :60
## Availability.of.banks.ATM.money.changing Shopping.facilities
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:0.000
## Median :0.0000 Median :0.000
## Mean :0.8991 Mean :1.885
## 3rd Qu.:0.0000 3rd Qu.:4.000
## Max. :5.0000 Max. :5.000
## NA's :41 NA's :46
## Shopping.facilities..value.for.money. Internet.access
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :1.000
## Mean :1.538 Mean :1.901
## 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000
## NA's :57 NA's :73
## Business.executive.lounges Availability.of.washrooms
## Min. :0.0000 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:4.000
## Median :0.0000 Median :4.000
## Mean :0.4842 Mean :3.908
## 3rd Qu.:0.0000 3rd Qu.:5.000
## Max. :5.0000 Max. :5.000
## NA's :91 NA's :35
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## Min. :0.000 Min. :0.000
## 1st Qu.:3.000 1st Qu.:3.000
## Median :4.000 Median :4.000
## Mean :3.801 Mean :4.003
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :37 NA's :41
## Cleanliness.of.airport.terminal Ambience.of.airport
## Min. :0.000 Min. :0.000
## 1st Qu.:4.000 1st Qu.:4.000
## Median :5.000 Median :4.000
## Mean :4.377 Mean :4.232
## 3rd Qu.:5.000 3rd Qu.:5.000
## Max. :5.000 Max. :5.000
## NA's :32 NA's :54
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## Min. :0.000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000
## Median :4.000 Median :0.0000
## Mean :2.644 Mean :0.9181
## 3rd Qu.:5.000 3rd Qu.:0.0000
## Max. :5.000 Max. :5.0000
## NA's :143 NA's :181
## Customs.inspection Overall.satisfaction
## Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:0.000
## Median :0.000 Median :0.000
## Mean :1.343 Mean :1.826
## 3rd Qu.:3.000 3rd Qu.:4.000
## Max. :5.000 Max. :5.000
## NA's :201 NA's :172
We tried two different methods for handling the NA values, imputation and omission, to compare them and to see whether imputing values would introduce a bias:
For the first method, we imputed values in rows with fewer than 6 missing values. If a passenger left 6 or more questions unanswered, we omitted that response, because a largely imputed row would not reflect true customer data and would affect our model. This reduced our data from 3501 to 3434 survey responses.
length(unique (unlist (lapply (survey.df, function (x) which(is.na(x))))))
## [1] 969
969/3501
## [1] 0.2767781
# Count the NA values in each row, then tally how many rows have exactly i NAs
df <- as.integer(apply(survey.df, 1, function(x) sum(is.na(x))))
a <- vector()
for (i in 1:37) {
  a[i] <- sum(df == i)
}
print(a)
## [1] 509 162 140 64 27 14 12 9 9 7 8 0 3 3 0 0 1
## [18] 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [35] 0 0 0
length(unique (unlist (lapply (survey.df, function (x) which(is.na(x))))))
## [1] 969
We have 969 rows with at least one NA value, which is about 28% of the responses.
survey.df$missing <- as.integer(apply(survey.df, 1, function(x) sum(is.na(x))))
# Keep rows with at most 5 missing values and drop the helper column
survey_temp <- survey.df[survey.df$missing <= 5, 1:37]
dim(survey_temp)
## [1] 3434 37
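The row-wise filtering above can also be written without `apply` or a loop. The following is a minimal sketch on a toy data frame; the column names `q1` to `q3`, the values, and the threshold `max_missing` are hypothetical and only illustrate the idea.

```r
# Toy data frame standing in for the survey (hypothetical values)
toy <- data.frame(
  q1 = c(5, NA, 3, NA),
  q2 = c(4, NA, NA, 2),
  q3 = c(NA, NA, 1, 5)
)

# Number of missing answers in each row
n_missing <- rowSums(is.na(toy))

# How many rows have 0, 1, 2, ... missing answers
missing_table <- table(factor(n_missing, levels = 0:ncol(toy)))

# Keep rows with at most `max_missing` missing answers (the report uses 5)
max_missing <- 1
toy_kept <- toy[n_missing <= max_missing, ]

print(n_missing)
print(missing_table)
print(dim(toy_kept))
```

On the survey data, the same pattern would presumably be `rowSums(is.na(survey.df))` with a threshold of 5.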
library(VIM)
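# imp_var = FALSE suppresses the extra *_imp indicator columns, so the result keeps 37 columns
survey_imputed <- kNN(survey_temp, variable = colnames(survey_temp), metric = NULL, k = 5, imp_var = FALSE)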
dim(survey_imputed)
## [1] 3434 37
These are the dimensions of the dataset after imputation. We deleted 67 rows and imputed the missing values in the rest.
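To make the idea behind k-nearest-neighbour imputation concrete, here is a simplified sketch in base R. It is not the algorithm used by `VIM::kNN` (which is documented as using a variation of the Gower distance); the function `knn_impute`, the distance (mean absolute difference over jointly observed columns), and the toy data are all assumptions made for illustration.

```r
# Simplified k-nearest-neighbour imputation for numeric columns (illustrative only).
# Assumption: distance = mean absolute difference over columns observed in both rows.
knn_impute <- function(df, k = 2) {
  m <- as.matrix(df)
  out <- m
  for (i in seq_len(nrow(m))) {
    miss <- which(is.na(m[i, ]))
    if (length(miss) == 0) next
    # Distance from row i to every row, ignoring columns where either value is NA
    d <- apply(m, 1, function(r) mean(abs(r - m[i, ]), na.rm = TRUE))
    d[i] <- Inf  # a row is never its own neighbour
    for (j in miss) {
      # Candidate donors: rows with an observed value in column j and a usable distance
      cand <- which(!is.na(m[, j]) & is.finite(d))
      nn <- cand[order(d[cand])][seq_len(min(k, length(cand)))]
      # Impute with the median of the k nearest donors
      out[i, j] <- median(m[nn, j])
    }
  }
  as.data.frame(out)
}

# Hypothetical ratings; the last respondent skipped question b
toy <- data.frame(
  a = c(5, 5, 1, 1, 5),
  b = c(4, 5, 1, 2, NA),
  c = c(5, 4, 2, 1, 5)
)
imputed <- knn_impute(toy, k = 2)
print(imputed)
```

In this toy example the two nearest neighbours of the last row are the first two respondents, so the missing value is filled with the median of their answers to `b`.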
For the second method, we omitted every row that contains an NA value. The dimensions of the resulting dataset are shown below.
survey_omit <- na.omit(survey.df)
survey_omit$missing <- NULL
dim(survey_omit)
## [1] 2532 37
We used both datasets to run various tests for the key driver analysis.
levels(survey.df$Departure.time)
## [1] "1:00 PM" "1:05 PM" "1:09 PM" "1:10 PM" "1:15 PM" "1:18 PM"
## [7] "1:20 AM" "1:35 PM" "1:42 PM" "1:45 PM" "1:49 PM" "1:50 PM"
## [13] "1:53 PM" "10:00" "10:00 AM" "10:01 AM" "10:04" "10:05"
## [19] "10:05 AM" "10:09" "10:10" "10:10 AM" "10:15" "10:15 AM"
## [25] "10:20" "10:20 AM" "10:25" "10:30" "10:30 AM" "10:34 AM"
## [31] "10:35" "10:35 AM" "10:38" "10:40" "10:40 AM" "10:42"
## [37] "10:43 AM" "10:44 AM" "10:45" "10:45 AM" "10:48" "10:50"
## [43] "10:50 AM" "10:54" "10:55" "10:55 AM" "10:59" "11:00"
## [49] "11:00 AM" "11:02" "11:05" "11:05 AM" "11:10" "11:10 AM"
## [55] "11:13 AM" "11:15" "11:15 AM" "11:20" "11:21" "11:25"
## [61] "11:25 AM" "11:30 AM" "11:34" "11:35" "11:35 AM" "11:40"
## [67] "11:40 AM" "11:41 AM" "11:44 AM" "11:45" "11:45 AM" "11:46"
## [73] "11:47" "11:50" "11:52" "11:53" "11:55" "11:55 AM"
## [79] "11:56 AM" "11:59" "12:00" "12:00 PM" "12:01 PM" "12:05"
## [85] "12:05 PM" "12:07" "12:10 PM" "12:12 PM" "12:14" "12:15"
## [91] "12:15 PM" "12:20" "12:20 PM" "12:24" "12:25 PM" "12:27"
## [97] "12:35" "12:35 PM" "12:39 PM" "12:40" "12:40 PM" "12:42 PM"
## [103] "12:45" "12:45 PM" "12:48 PM" "12:50" "12:50 PM" "12:52 PM"
## [109] "12:54" "12:55" "12:55 PM" "12:56" "12:57" "13:00"
## [115] "13:05" "13:06" "13:10" "13:15" "13:20" "13:25"
## [121] "13:30" "13:32" "13:35" "13:37" "13:40" "13:45"
## [127] "13:46" "13:50" "14:00" "14:05" "14:10" "14:11"
## [133] "14:13" "14:15" "14:18" "14:23" "14:25" "14:26"
## [139] "14:31" "14:35" "14:40" "14:45" "14:47" "14:50"
## [145] "14:55" "14:59" "15:05" "15:10" "15:15" "15:20"
## [151] "15:22" "15:25" "15:29" "15:30" "15:32" "15:33"
## [157] "15:35" "15:50" "15:51" "16:00" "16:05" "16:08"
## [163] "16:10" "16:15" "16:22" "16:24" "16:25" "16:28"
## [169] "16:29" "16:30" "16:35" "16:40" "16:41" "16:45"
## [175] "16:49" "16:53" "16:55" "17:00" "17:03" "17:04"
## [181] "17:05" "17:10" "17:15" "17:16" "17:20" "17:23"
## [187] "17:25" "17:26" "17:30" "17:35" "17:40" "17:43"
## [193] "17:45" "17:55" "17:59" "18:00" "18:05" "18:10"
## [199] "18:15" "18:20" "18:25" "18:26" "18:30" "18:35"
## [205] "18:40" "18:45" "18:47" "18:50" "18:55" "18:56"
## [211] "19:00" "19:11" "19:15" "19:16" "19:20" "19:23"
## [217] "19:30" "19:35" "19:39" "19:40" "19:42" "19:45"
## [223] "19:50" "19:55" "2:07 PM" "2:15 PM" "2:20 PM" "2:40 PM"
## [229] "2:45 PM" "2:55 PM" "20:00" "20:05" "20:06" "20:10"
## [235] "20:35" "20:50" "21:05" "21:35" "21:51" "3:00 PM"
## [241] "3:05 PM" "3:12 PM" "3:15 PM" "3:30 PM" "3:35 PM" "3:50 PM"
## [247] "4:05 PM" "4:35 PM" "4:40 PM" "5:05 PM" "5:10 PM" "5:20 PM"
## [253] "5:25 PM" "5:35 PM" "5:55 PM" "6:10" "6:20 PM" "6:25 AM"
## [259] "6:30" "6:35 AM" "6:35 PM" "6:36" "6:38" "6:40 PM"
## [265] "6:45 PM" "6:50" "6:50 AM" "6:50 PM" "6:54" "6:55 PM"
## [271] "6:57 PM" "7:00" "7:00 PM" "7:15" "7:20" "7:20 PM"
## [277] "7:29" "7:30" "7:30 PM" "7:35 PM" "7:40" "7:45"
## [283] "7:45 AM" "7:49" "7:50 AM" "8:00" "8:05" "8:05 AM"
## [289] "8:10" "8:10 AM" "8:10 PM" "8:15" "8:25" "8:26"
## [295] "8:30" "8:34 AM" "8:35" "8:35 AM" "8:39" "8:40"
## [301] "8:40 PM" "8:45" "8:45 AM" "8:45 PM" "8:46 PM" "8:50"
## [307] "8:50 AM" "8:53" "8:54" "8:55" "8:55 PM" "9:04"
## [313] "9:05" "9:10" "9:10 AM" "9:16" "9:20" "9:20 AM"
## [319] "9:25" "9:25 PM" "9:30" "9:30 AM" "9:35" "9:40"
## [325] "9:45" "9:50" "9:51 AM" "9:51 PM" "9:55" "9:55 PM"
## [331] "9:57 PM"
The second step in cleaning the data was to format the Departure.time feature. Since the survey data contains two time formats (both 12-hour and 24-hour), we used regular expressions to bin the times by time of day.
We binned “Departure.time” into Early Morning, Morning, Day, Evening and Night:
+ 01.00 - 07.59 is Early Morning
+ 08.00 - 11.59 is Morning
+ 12.00 - 16.59 is Day
+ 17.00 - 19.59 is Evening
+ 20.00 - 00.59 is Night
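The binning rules can also be expressed compactly by first converting every time to a 24-hour hour value. The sketch below uses only base R; the helper name `bin_departure_time` is hypothetical, and unlike the row-by-row code in this report it also treats a 24-hour time such as "0:30" as Night, so it should be read as an illustration of the rules rather than a drop-in replacement.

```r
# Hypothetical helper: map departure-time strings (12- or 24-hour) to a part of the day
bin_departure_time <- function(x) {
  x <- trimws(as.character(x))
  # Accept "H:MM" or "HH:MM", optionally followed by AM/PM
  valid <- grepl("^[0-9]{1,2}:[0-9]{2}( ?[AaPp][Mm])?$", x)
  hour <- rep(NA_integer_, length(x))
  hour[valid] <- as.integer(sub(":.*$", "", x[valid]))

  # Convert 12-hour times to a 24-hour hour value
  is_am <- grepl("am$", x, ignore.case = TRUE)
  is_pm <- grepl("pm$", x, ignore.case = TRUE)
  pm_shift <- which(is_pm & hour < 12)
  hour[pm_shift] <- hour[pm_shift] + 12L
  hour[which(is_am & hour == 12)] <- 0L

  # Assign bins; anything unparseable stays NA
  bins <- rep(NA_character_, length(x))
  bins[which(hour >= 1 & hour <= 7)] <- "Early Morning"
  bins[which(hour >= 8 & hour <= 11)] <- "Morning"
  bins[which(hour >= 12 & hour <= 16)] <- "Day"
  bins[which(hour >= 17 & hour <= 19)] <- "Evening"
  bins[which((hour >= 20 & hour <= 23) | hour == 0)] <- "Night"
  bins
}

example_times <- c("7:49", "12:25 PM", "18:00", "9:51 PM", "12:05 AM", "11:45", "bad")
result <- bin_departure_time(example_times)
print(data.frame(time = example_times, bin = result))
```

Converting to a single hour value first keeps the bin boundaries in one place, which makes them easier to change than a chain of per-hour conditions.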
library(stringr)
survey_imputed$Departure.time <- as.character(survey_imputed$Departure.time)
for (i in 1:nrow(survey_imputed)) {
if (str_detect(survey_imputed$Departure.time[i], regex(".am", ignore_case = TRUE)))
{
if(str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"}
else {survey_imputed$Departure.time[i] <- NA}
} else if (str_detect(survey_imputed$Departure.time[i], regex(".pm", ignore_case = TRUE)))
{
if(str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
else if(str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"}
} else if (str_detect(survey_imputed$Departure.time[i], regex("^00:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
} else if (str_detect(survey_imputed$Departure.time[i], regex("^1:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
} else if (str_detect(survey_imputed$Departure.time[i], regex("^2:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
} else if (str_detect(survey_imputed$Departure.time[i], regex("^3:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
} else if (str_detect(survey_imputed$Departure.time[i], regex("^4:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^5:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^6:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^7:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Early Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^8:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^9:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^10:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^11:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Morning"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^12:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^13:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^14:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^15:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^16:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Day"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^17:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^18:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^19:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Evening"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^20:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^21:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^22:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
}else if (str_detect(survey_imputed$Departure.time[i], regex("^23:", ignore_case = TRUE))) {survey_imputed$Departure.time[i] <- "Night"
}else {survey_imputed$Departure.time[i] <- NA}
}
survey_omit$Departure.time.char <- as.character(survey_omit$Departure.time)
survey_omit$Departure.time.bin <- survey_omit$Departure.time.char
for (i in 1:nrow(survey_omit)) {
if (str_detect(survey_omit$Departure.time.char[i], regex(".am", ignore_case = TRUE)))
{
if(str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"}
else {survey_omit$Departure.time.bin[i] <- NA}
} else if (str_detect(survey_omit$Departure.time.char[i], regex(".pm", ignore_case = TRUE)))
{
if(str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
else if(str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"}
} else if (str_detect(survey_omit$Departure.time.char[i], regex("^00:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
} else if (str_detect(survey_omit$Departure.time.char[i], regex("^1:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
} else if (str_detect(survey_omit$Departure.time.char[i], regex("^2:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
} else if (str_detect(survey_omit$Departure.time.char[i], regex("^3:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
} else if (str_detect(survey_omit$Departure.time.char[i], regex("^4:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^5:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^6:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^7:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Early Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^8:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^9:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^10:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^11:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Morning"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^12:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^13:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^14:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^15:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^16:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Day"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^17:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^18:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^19:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Evening"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^20:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^21:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^22:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
}else if (str_detect(survey_omit$Departure.time.char[i], regex("^23:", ignore_case = TRUE))) {survey_omit$Departure.time.bin[i] <- "Night"
}else {survey_omit$Departure.time.bin[i] <- NA}
}
survey_omit$Departure.time <- survey_omit$Departure.time.bin
survey_omit$Departure.time.char <- NULL
survey_omit$Departure.time.bin <- NULL
survey_omit$Departure.time <- as.factor(survey_omit$Departure.time)
head(survey_imputed)
## Quarter Date.recorded Departure.time
## 1 3Q16 09/04/2016 Morning
## 2 2Q16 05/01/2016 Day
## 3 2Q16 04/07/2016 Morning
## 4 3Q16 09/02/2016 Evening
## 5 3Q16 08/04/2016 Early Morning
## 6 3Q16 08/02/2016 Morning
## Ground.transportation.to.from.airport Parking.facilities
## 1 0 0
## 2 0 0
## 3 4 4
## 4 0 0
## 5 5 0
## 6 5 5
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 0 0
## 2 0 0
## 3 4 5
## 4 0 0
## 5 0 0
## 6 2 0
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 5 0
## 2 5 0
## 3 5 5
## 4 4 0
## 5 4 4
## 6 4 5
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 0 3
## 2 0 2
## 3 5 5
## 4 0 3
## 5 4 5
## 6 5 0
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 4 4
## 2 3 3
## 3 5 5
## 4 3 4
## 5 5 2
## 6 0 5
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 5 2
## 2 0 2
## 3 5 5
## 4 4 2
## 5 3 2
## 6 5 5
## Feeling.of.safety.and.security
## 1 4
## 2 3
## 3 5
## 4 3
## 5 3
## 6 5
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 5 5
## 2 5 5
## 3 0 5
## 4 4 4
## 5 4 3
## 6 5 5
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 5 0
## 2 4 0
## 3 0 0
## 4 4 0
## 5 5 0
## 6 5 0
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0 0 0
## 2 0 4 3
## 3 5 5 5
## 4 0 0 2
## 5 4 4 4
## 6 5 5 5
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 0 0
## 2 0 0
## 3 0 5
## 4 0 0
## 5 3 4
## 6 0 5
## Shopping.facilities..value.for.money. Internet.access
## 1 0 0
## 2 0 4
## 3 0 0
## 4 0 0
## 5 3 2
## 6 5 5
## Business.executive.lounges Availability.of.washrooms
## 1 0 4
## 2 0 0
## 3 0 5
## 4 0 4
## 5 2 4
## 6 5 5
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 0 4
## 2 0 4
## 3 5 5
## 4 4 4
## 5 4 2
## 6 5 5
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 5 4
## 2 4 4
## 3 5 5
## 4 4 4
## 5 5 4
## 6 5 5
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 4 0
## 2 4 0
## 3 5 0
## 4 4 0
## 5 4 0
## 6 5 0
## Customs.inspection Overall.satisfaction
## 1 0 0
## 2 0 0
## 3 5 0
## 4 0 0
## 5 4 0
## 6 5 0
head(survey_omit)
## Quarter Date.recorded Departure.time
## 1 3Q16 09/04/2016 Morning
## 2 2Q16 05/01/2016 Day
## 5 3Q16 08/04/2016 Early Morning
## 6 3Q16 08/02/2016 Morning
## 7 2Q16 05/06/2016 Evening
## 13 3Q16 07/11/2016 Evening
## Ground.transportation.to.from.airport Parking.facilities
## 1 0 0
## 2 0 0
## 5 5 0
## 6 5 5
## 7 2 3
## 13 5 0
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 0 0
## 2 0 0
## 5 0 0
## 6 2 0
## 7 3 3
## 13 0 0
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 5 0
## 2 5 0
## 5 4 4
## 6 4 5
## 7 5 5
## 13 0 0
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 0 3
## 2 0 2
## 5 4 5
## 6 5 0
## 7 5 5
## 13 0 1
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 4 4
## 2 3 3
## 5 5 2
## 6 0 5
## 7 5 4
## 13 1 5
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 5 2
## 2 0 2
## 5 3 2
## 6 5 5
## 7 4 4
## 13 5 4
## Feeling.of.safety.and.security
## 1 4
## 2 3
## 5 3
## 6 5
## 7 3
## 13 0
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 5 5
## 2 5 5
## 5 4 3
## 6 5 5
## 7 5 5
## 13 5 5
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 5 0
## 2 4 0
## 5 5 0
## 6 5 0
## 7 4 0
## 13 3 0
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0 0 0
## 2 0 4 3
## 5 4 4 4
## 6 5 5 5
## 7 4 4 3
## 13 0 0 0
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 0 0
## 2 0 0
## 5 3 4
## 6 0 5
## 7 0 0
## 13 0 0
## Shopping.facilities..value.for.money. Internet.access
## 1 0 0
## 2 0 4
## 5 3 2
## 6 5 5
## 7 0 0
## 13 0 0
## Business.executive.lounges Availability.of.washrooms
## 1 0 4
## 2 0 0
## 5 2 4
## 6 5 5
## 7 0 4
## 13 0 0
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 0 4
## 2 0 4
## 5 4 2
## 6 5 5
## 7 4 4
## 13 0 4
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 5 4
## 2 4 4
## 5 5 4
## 6 5 5
## 7 4 4
## 13 5 4
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 4 0
## 2 4 0
## 5 4 0
## 6 5 0
## 7 4 0
## 13 3 0
## Customs.inspection Overall.satisfaction
## 1 0 0
## 2 0 0
## 5 4 0
## 6 5 0
## 7 4 0
## 13 5 0
The correlation plots for the two datasets were similar and looked like the ones given below.
library(corrplot)
## corrplot 0.84 loaded
corrplot(cor(survey_imputed[,4:36]), method = 'color', tl.cex = 0.3) #Imputed
corrplot(cor(survey_omit[,4:36]), method = "color", tl.cex = 0.3) #Omitted
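The visual similarity of two correlation plots can also be checked numerically by comparing the two correlation matrices entry by entry. The following is a minimal sketch on simulated data; the data frames `full` and `subset_rows` are stand-ins for the imputed and omitted datasets and are assumptions made for illustration only.

```r
set.seed(1)

# Simulated stand-in for a full dataset (illustrative values only)
full <- data.frame(a = rnorm(100), b = rnorm(100))
full$c <- full$a + rnorm(100, sd = 0.5)

# Stand-in for the reduced dataset: keep only the first 70 rows
subset_rows <- full[1:70, ]

# Absolute difference between the two correlation matrices
cor_diff <- abs(cor(full) - cor(subset_rows))

# Largest disagreement between any pair of variables
max_diff <- max(cor_diff)
max_diff
```

A small `max_diff` would support the statement that the two plots tell the same story; what counts as small is a judgment call.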
Checking our target variable
summary(as.factor(survey_imputed$Overall.satisfaction))
## 0 1 2 3 4 5
## 2036 1 10 134 546 707
summary(as.factor(survey_omit$Overall.satisfaction))
## 0 1 2 3 4 5
## 1508 1 5 81 408 529
#survey_omit[(survey_omit$Overall.satisfaction == 0),]
As we can see, the data is skewed towards zero. While further investigating the survey responses that had an overall satisfaction of “0”, we came across many instances where the respondents had given satisfaction ratings of “4” and “5” to individual services. This could mean one of two things: either the respondents did not fill in the overall satisfaction, or the overall satisfaction really is “0”. We assume that the overall satisfaction entered by the respondents is “0” and analyze the data further on this assumption.
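One way to gauge how many zero-rated responses look like skipped answers is to count respondents whose overall satisfaction is 0 although they rated at least one individual service 4 or 5. The sketch below uses a small made-up data frame (`toy`) rather than the survey file; the selected columns and the threshold of 4 are assumptions for illustration.

```r
# Toy data: three service ratings plus the overall score (illustrative values only)
toy <- data.frame(
  Restaurants          = c(5, 0, 4, 2),
  Internet.access      = c(4, 0, 5, 1),
  Ambience.of.airport  = c(5, 0, 3, 2),
  Overall.satisfaction = c(0, 0, 5, 0)
)

service_cols <- setdiff(names(toy), "Overall.satisfaction")

# TRUE when overall is 0 although some individual service was rated 4 or 5
high_but_zero <- toy$Overall.satisfaction == 0 &
  apply(toy[, service_cols] >= 4, 1, any)

# Number of suspicious responses
n_suspect <- sum(high_but_zero)
n_suspect
```

A large count on the real data would suggest that many of the zeros are unanswered questions rather than genuine ratings.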
df2 <- na.omit(survey.df)
library(stringr)
df2$Departure.time.char <- as.character(df2$Departure.time)
for (i in 1:nrow(df2)) {
if (str_detect(df2$Departure.time.char[i], regex(".am", ignore_case = TRUE)))
{
df2$Quarter.new[i] <- as.character(df2$Quarter[i])
} else if (str_detect(df2$Departure.time.char[i], regex(".pm", ignore_case = TRUE)))
{
df2$Quarter.new[i] <- as.character(df2$Quarter[i])
} else {df2$Quarter.new[i] <- NA} # set NA for this row only; assigning to the whole column would overwrite earlier rows
}
df2$Quarter.new <- as.factor(df2$Quarter.new)
summary(df2$Quarter.new)
## 1Q17 2Q17 4Q16 NA's
## 266 237 252 1777
From the result above, we can say that the time format changed after the 3rd quarter of 2016 while the responses were being recorded.
Plotting the distribution of each service present at the airport
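The same check can be written without a loop by flagging the am/pm suffix in one vectorized call and cross-tabulating it against the quarter. This is a minimal sketch on made-up values; the data frame `toy`, the column `has_ampm`, and the pattern `[ap]m` are assumptions for illustration.

```r
# Toy data mixing 24-hour and am/pm time strings (illustrative values only)
toy <- data.frame(
  Quarter        = c("3Q16", "3Q16", "4Q16", "1Q17", "1Q17"),
  Departure.time = c("11:45", "17:16", "9:30 AM", "2:15 pm", "7:05 AM"),
  stringsAsFactors = FALSE
)

# Flag rows whose time string carries an am/pm suffix
toy$has_ampm <- grepl("[ap]m", toy$Departure.time, ignore.case = TRUE)

# Cross-tabulate quarter against time format
fmt_by_quarter <- table(toy$Quarter, toy$has_ampm)
fmt_by_quarter
```

Quarters that appear only in the `TRUE` column were recorded entirely in the am/pm format.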
for (i in 4:36){
hist(survey_omit[,i], xlab = "Satisfaction Levels", main = names(survey_omit[i]))
}
plot(aggregate(survey_omit$Overall.satisfaction ~ survey_omit$Quarter, data=survey_omit, mean))
plot(aggregate(survey_imputed$Overall.satisfaction ~ survey_imputed$Quarter, data=survey_imputed, mean))
The overall satisfaction aggregated by quarter shows that for some quarters the average is 0, while for others it is almost 5. 2Q17, 1Q17, 4Q16, and 1Q15 have an average overall satisfaction of almost 5, while the other quarters have an average overall satisfaction of 0.
This shows that the overall satisfaction is highly skewed across quarters. This could be because of seasonal repairs or holiday seasons, but we do not have any data to support these claims. Another strong possibility is that the system defaulted unfilled responses to 0, but the occurrence of NAs in these quarters hints otherwise. Because this data is inconclusive, we decided to drop Quarter as a predictor of Overall Satisfaction. Including it would cause the model to assign high and undue importance to this feature.
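The split between quarters can be made explicit by computing, for each quarter, the share of responses whose overall satisfaction is exactly 0. The sketch below uses made-up values; the data frame `toy` is an assumption and does not reproduce the survey figures.

```r
# Toy data: overall satisfaction by quarter (illustrative values only)
toy <- data.frame(
  Quarter              = c("3Q16", "3Q16", "3Q16", "1Q17", "1Q17", "1Q17"),
  Overall.satisfaction = c(0, 0, 0, 5, 4, 5)
)

# Share of zero ratings within each quarter
zero_share <- tapply(toy$Overall.satisfaction == 0, toy$Quarter, mean)
zero_share
```

A share of 1 for some quarters and 0 for others would point to a recording issue tied to the quarter rather than to gradual changes in passenger opinion.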
head(survey_imputed)
## Quarter Date.recorded Departure.time
## 1 3Q16 09/04/2016 Morning
## 2 2Q16 05/01/2016 Day
## 3 2Q16 04/07/2016 Morning
## 4 3Q16 09/02/2016 Evening
## 5 3Q16 08/04/2016 Early Morning
## 6 3Q16 08/02/2016 Morning
## Ground.transportation.to.from.airport Parking.facilities
## 1 0 0
## 2 0 0
## 3 4 4
## 4 0 0
## 5 5 0
## 6 5 5
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 0 0
## 2 0 0
## 3 4 5
## 4 0 0
## 5 0 0
## 6 2 0
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 5 0
## 2 5 0
## 3 5 5
## 4 4 0
## 5 4 4
## 6 4 5
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 0 3
## 2 0 2
## 3 5 5
## 4 0 3
## 5 4 5
## 6 5 0
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 4 4
## 2 3 3
## 3 5 5
## 4 3 4
## 5 5 2
## 6 0 5
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 5 2
## 2 0 2
## 3 5 5
## 4 4 2
## 5 3 2
## 6 5 5
## Feeling.of.safety.and.security
## 1 4
## 2 3
## 3 5
## 4 3
## 5 3
## 6 5
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 5 5
## 2 5 5
## 3 0 5
## 4 4 4
## 5 4 3
## 6 5 5
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 5 0
## 2 4 0
## 3 0 0
## 4 4 0
## 5 5 0
## 6 5 0
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0 0 0
## 2 0 4 3
## 3 5 5 5
## 4 0 0 2
## 5 4 4 4
## 6 5 5 5
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 0 0
## 2 0 0
## 3 0 5
## 4 0 0
## 5 3 4
## 6 0 5
## Shopping.facilities..value.for.money. Internet.access
## 1 0 0
## 2 0 4
## 3 0 0
## 4 0 0
## 5 3 2
## 6 5 5
## Business.executive.lounges Availability.of.washrooms
## 1 0 4
## 2 0 0
## 3 0 5
## 4 0 4
## 5 2 4
## 6 5 5
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 0 4
## 2 0 4
## 3 5 5
## 4 4 4
## 5 4 2
## 6 5 5
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 5 4
## 2 4 4
## 3 5 5
## 4 4 4
## 5 5 4
## 6 5 5
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 4 0
## 2 4 0
## 3 5 0
## 4 4 0
## 5 4 0
## 6 5 0
## Customs.inspection Overall.satisfaction
## 1 0 0
## 2 0 0
## 3 5 0
## 4 0 0
## 5 4 0
## 6 5 0
survey_imputed$Departure.time <- as.factor(survey_imputed$Departure.time)
sur_sc <- data.frame(scale(survey_imputed[,4:36], center = TRUE, scale = TRUE))
sur_sc$Overall.satisfaction = survey_imputed$Overall.satisfaction
print("MEAN")
## [1] "MEAN"
apply(sur_sc[1:33],2,mean)
## Ground.transportation.to.from.airport
## 6.775815e-17
## Parking.facilities
## -8.442936e-17
## Parking.facilities..value.for.money.
## 4.408046e-17
## Availability.of.baggage.carts
## 7.381137e-17
## Efficiency.of.check.in.staff
## 1.734218e-18
## Check.in.wait.time
## 9.137336e-17
## Courtesy.of.of.check.in.staff
## -3.485663e-17
## Wait.time.at.passport.inspection
## 3.019399e-17
## Courtesy.of.inspection.staff
## 8.707463e-17
## Courtesy.of.security.staff
## 1.100621e-16
## Thoroughness.of.security.inspection
## 1.011253e-16
## Wait.time.of.security.inspection
## -1.846987e-16
## Feeling.of.safety.and.security
## -5.348140e-17
## Ease.of.finding.your.way.through.the.airport
## -2.956977e-17
## Flight.information.screens
## -2.732491e-16
## Walking.distance.inside.terminal
## -1.681233e-16
## Ease.of.making.connections
## 3.766217e-17
## Courtesy.of.airport.staff
## -1.985491e-17
## Restaurants
## -7.437384e-17
## Restaurants..value.for.money.
## -1.090117e-16
## Availability.of.banks.ATM.money.changing
## 1.423198e-17
## Shopping.facilities
## 2.392596e-17
## Shopping.facilities..value.for.money.
## -1.481193e-17
## Internet.access
## 2.330427e-17
## Business.executive.lounges
## 3.975873e-17
## Availability.of.washrooms
## -9.299667e-17
## Cleanliness.of.washrooms
## -9.809902e-17
## Comfort.of.waiting.gate.areas
## 4.206730e-19
## Cleanliness.of.airport.terminal
## -4.046136e-16
## Ambience.of.airport
## -1.120128e-16
## Arrivals.passport.and.visa.inspection
## 4.037103e-17
## Speed.of.baggage.delivery
## 1.039331e-17
## Customs.inspection
## 2.466383e-17
print("STANDARD DEVIATION")
## [1] "STANDARD DEVIATION"
apply(sur_sc[1:33],2,sd)
## Ground.transportation.to.from.airport
## 1
## Parking.facilities
## 1
## Parking.facilities..value.for.money.
## 1
## Availability.of.baggage.carts
## 1
## Efficiency.of.check.in.staff
## 1
## Check.in.wait.time
## 1
## Courtesy.of.of.check.in.staff
## 1
## Wait.time.at.passport.inspection
## 1
## Courtesy.of.inspection.staff
## 1
## Courtesy.of.security.staff
## 1
## Thoroughness.of.security.inspection
## 1
## Wait.time.of.security.inspection
## 1
## Feeling.of.safety.and.security
## 1
## Ease.of.finding.your.way.through.the.airport
## 1
## Flight.information.screens
## 1
## Walking.distance.inside.terminal
## 1
## Ease.of.making.connections
## 1
## Courtesy.of.airport.staff
## 1
## Restaurants
## 1
## Restaurants..value.for.money.
## 1
## Availability.of.banks.ATM.money.changing
## 1
## Shopping.facilities
## 1
## Shopping.facilities..value.for.money.
## 1
## Internet.access
## 1
## Business.executive.lounges
## 1
## Availability.of.washrooms
## 1
## Cleanliness.of.washrooms
## 1
## Comfort.of.waiting.gate.areas
## 1
## Cleanliness.of.airport.terminal
## 1
## Ambience.of.airport
## 1
## Arrivals.passport.and.visa.inspection
## 1
## Speed.of.baggage.delivery
## 1
## Customs.inspection
## 1
my.pca <- prcomp(sur_sc[,1:33])
summary(my.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 2.4658 1.8448 1.64178 1.40541 1.3679 1.31664
## Proportion of Variance 0.1842 0.1031 0.08168 0.05985 0.0567 0.05253
## Cumulative Proportion 0.1842 0.2874 0.36906 0.42892 0.4856 0.53815
## PC7 PC8 PC9 PC10 PC11 PC12
## Standard deviation 1.19923 1.15935 1.11358 1.01213 0.9950 0.95811
## Proportion of Variance 0.04358 0.04073 0.03758 0.03104 0.0300 0.02782
## Cumulative Proportion 0.58173 0.62246 0.66004 0.69108 0.7211 0.74890
## PC13 PC14 PC15 PC16 PC17 PC18
## Standard deviation 0.91171 0.8731 0.8423 0.79820 0.77573 0.73189
## Proportion of Variance 0.02519 0.0231 0.0215 0.01931 0.01824 0.01623
## Cumulative Proportion 0.77408 0.7972 0.8187 0.83799 0.85623 0.87246
## PC19 PC20 PC21 PC22 PC23 PC24
## Standard deviation 0.70936 0.70165 0.68390 0.64808 0.60025 0.57900
## Proportion of Variance 0.01525 0.01492 0.01417 0.01273 0.01092 0.01016
## Cumulative Proportion 0.88771 0.90263 0.91680 0.92953 0.94044 0.95060
## PC25 PC26 PC27 PC28 PC29 PC30
## Standard deviation 0.56753 0.51751 0.47409 0.43400 0.41887 0.36757
## Proportion of Variance 0.00976 0.00812 0.00681 0.00571 0.00532 0.00409
## Cumulative Proportion 0.96036 0.96848 0.97529 0.98100 0.98631 0.99041
## PC31 PC32 PC33
## Standard deviation 0.35444 0.32215 0.29515
## Proportion of Variance 0.00381 0.00314 0.00264
## Cumulative Proportion 0.99422 0.99736 1.00000
library(factoextra)
## Loading required package: ggplot2
## Welcome! Related Books: `Practical Guide To Cluster Analysis in R` at https://goo.gl/13EFCZ
sur.agg <- aggregate(sur_sc[, 1:33], list(sur_sc$Overall.satisfaction), mean)
sur.agg
## Group.1 Ground.transportation.to.from.airport Parking.facilities
## 1 0 0.013166801 0.001845229
## 2 1 -0.543213638 -0.611255614
## 3 2 -0.039547660 -0.235754401
## 4 3 -0.081918475 -0.082831733
## 5 4 -0.023278729 -0.021771764
## 6 5 -0.003085792 0.031398512
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 -0.00804023 -0.007157468
## 2 -0.59951807 1.673061392
## 3 -0.07313439 0.214722884
## 4 -0.05392968 -0.026380152
## 5 -0.03285758 -0.109283853
## 6 0.06063308 0.104605713
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 -0.03562549 -0.02643162
## 2 -0.45338318 0.12254995
## 3 0.01363379 -0.10899697
## 4 -0.39674866 -0.43471783
## 5 -0.12621469 -0.11175348
## 6 0.27571156 0.24618356
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 -0.01692423 -0.06188028
## 2 0.12717674 0.32881176
## 3 -0.09678476 0.17574268
## 4 -0.43272701 -0.28422610
## 5 -0.16405525 0.03257917
## 6 0.25863929 0.20396051
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 -0.04540544 -0.02816605
## 2 0.28224017 0.02999378
## 3 -0.03192928 -0.60061024
## 4 -0.25309831 -0.60270179
## 5 -0.02655886 -0.16506485
## 6 0.19929114 0.33127246
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 -0.01993093 -0.04758989
## 2 -0.06672955 -0.01418078
## 3 -0.53920229 -0.78714605
## 4 -0.61324653 -0.61986252
## 5 -0.16047414 -0.10761615
## 6 0.30527869 0.34879583
## Feeling.of.safety.and.security
## 1 -0.02682494
## 2 -0.15948523
## 3 -0.32569836
## 4 -0.58742202
## 5 -0.17775041
## 6 0.33069070
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 0.0003949052 -0.008624559
## 2 -4.0844841290 -0.173149770
## 3 -1.4095195728 -0.620775600
## 4 -0.7863448319 -0.418119130
## 5 -0.1842929342 -0.115761843
## 6 0.3159403146 0.202509814
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 -0.01872067 -0.11990041
## 2 -0.44235061 -0.28523955
## 3 -1.54389667 -0.03892456
## 4 -0.83693427 0.14979439
## 5 -0.15990291 0.08017280
## 6 0.35849051 0.25593342
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0.009178752 -0.0001026923 0.006636042
## 2 -1.385210770 0.5268619239 0.799912152
## 3 -0.045565117 -0.4930049570 -0.356097165
## 4 -0.493446649 -0.2228163430 -0.247644266
## 5 -0.213388858 -0.1166255128 -0.125621210
## 6 0.234490933 0.1388220499 0.128746239
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 0.0164804314 0.01111814
## 2 1.2487263830 0.55683549
## 3 0.2428184378 -0.18952402
## 4 -0.0406727390 -0.16353140
## 5 -0.0593224102 -0.13302306
## 6 0.0008615471 0.10360067
## Shopping.facilities..value.for.money. Internet.access
## 1 0.01352222 0.0006782325
## 2 0.81865530 1.0757406225
## 3 -0.13147311 0.0012813531
## 4 -0.15733262 -0.2110140818
## 5 -0.10874864 -0.0543813720
## 6 0.07556460 0.0784988408
## Business.executive.lounges Availability.of.washrooms
## 1 0.010035796 -0.01217199
## 2 1.900912708 -0.63086746
## 3 0.385829909 -1.04936048
## 4 0.204924500 -0.55799554
## 5 -0.091448921 -0.17098502
## 6 -0.005262893 0.28859386
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 -0.02129018 0.004792325
## 2 -0.52448293 -0.975717379
## 3 -1.04919505 -1.561147806
## 4 -0.49022001 -0.982998852
## 5 -0.13767591 -0.312729929
## 6 0.27612981 0.437485727
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 -0.008228918 -0.01580088
## 2 -1.644644511 -1.41109555
## 3 -1.405638618 -1.75322562
## 4 -1.127392952 -1.13024252
## 5 -0.300783538 -0.38136343
## 6 0.491872099 0.58103376
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 0.7515050 -0.5028502
## 2 -1.2237881 1.8107023
## 3 -0.8585528 0.5281696
## 4 -1.0091442 0.5098974
## 5 -1.0941831 0.5849718
## 6 -1.1140108 0.8896601
## Customs.inspection
## 1 0.4282349
## 2 -0.6954321
## 3 -0.4355155
## 4 -0.6023276
## 5 -0.6364034
## 6 -0.6204349
sur.mean.sc <- data.frame(scale(sur.agg[,2:33], center = TRUE, scale = TRUE))
print(sur.mean.sc)
## Ground.transportation.to.from.airport Parking.facilities
## 1 0.5913909 0.6356218
## 2 -2.0169942 -1.8800852
## 3 0.3442586 -0.3393093
## 4 0.1456186 0.2881709
## 5 0.4205295 0.5387154
## 6 0.5151966 0.7568864
## Parking.facilities..value.for.money. Availability.of.baggage.carts
## 1 0.4563933 -0.4651890
## 2 -2.0028623 2.0128470
## 3 0.1857438 -0.1379534
## 4 0.2655935 -0.4935392
## 5 0.3532074 -0.6158080
## 6 0.7419243 -0.3003574
## Efficiency.of.check.in.staff Check.in.wait.time
## 1 0.31205617 0.1101886
## 2 -1.22503081 0.7473897
## 3 0.49329951 -0.2429472
## 4 -1.01665120 -1.6360705
## 5 -0.02125545 -0.2547370
## 6 1.45758177 1.2761764
## Courtesy.of.of.check.in.staff Wait.time.at.passport.inspection
## 1 0.1545975 -0.5825872
## 2 0.7536472 1.1996489
## 3 -0.1773950 0.5013873
## 4 -1.5739582 -1.5968713
## 5 -0.4570487 -0.1516876
## 6 1.3001571 0.6301098
## Courtesy.of.inspection.staff Courtesy.of.security.staff
## 1 -0.3442981 0.39035044
## 2 1.3607252 0.54759318
## 3 -0.2741700 -1.15732751
## 4 -1.4251036 -1.16298231
## 5 -0.2462231 0.02022659
## 6 0.9290697 1.36213961
## Thoroughness.of.security.inspection Wait.time.of.security.inspection
## 1 0.47250980 0.3724913
## 2 0.33639177 0.4517512
## 3 -1.03783786 -1.3820350
## 4 -1.25320221 -0.9851708
## 5 0.06372716 0.2300847
## 6 1.41841133 1.3128785
## Feeling.of.safety.and.security
## 1 0.427505956
## 2 -0.005671361
## 3 -0.548409218
## 4 -1.403018828
## 5 -0.065312884
## 6 1.594906335
## Ease.of.finding.your.way.through.the.airport Flight.information.screens
## 1 0.6327953 0.61477584
## 2 -1.8887736 0.05398164
## 3 -0.2375357 -1.47177824
## 4 0.1471460 -0.78101133
## 5 0.5187887 0.24959189
## 6 0.8275793 1.33444020
## Walking.distance.inside.terminal Ease.of.making.connections
## 1 0.626099413 -0.6485807
## 2 -0.002668923 -1.4938016
## 3 -1.637627203 -0.2346287
## 4 -0.588325620 0.7301108
## 5 0.416551149 0.3742022
## 6 1.185971184 1.2726980
## Courtesy.of.airport.staff Restaurants Restaurants..value.for.money.
## 1 0.5625514 0.08008581 -0.06701729
## 2 -1.8522564 1.60318522 1.85320820
## 3 0.4677458 -1.34456231 -0.94505905
## 4 -0.3078967 -0.56362919 -0.68253504
## 5 0.1771082 -0.25670310 -0.38716275
## 6 0.9527477 0.48162357 0.22856594
## Availability.of.banks.ATM.money.changing Shopping.facilities
## 1 -0.42936481 -0.07037704
## 2 1.99389957 1.86985781
## 3 0.01573856 -0.78373701
## 4 -0.54175896 -0.69132326
## 5 -0.57843434 -0.58285438
## 6 -0.46008001 0.25843389
## Shopping.facilities..value.for.money. Internet.access
## 1 -0.19297586 -0.3181960
## 2 1.97977281 1.9964587
## 3 -0.58426322 -0.3168975
## 4 -0.65404823 -0.7739785
## 5 -0.52293851 -0.4367417
## 6 -0.02554699 -0.1506450
## Business.executive.lounges Availability.of.washrooms
## 1 -0.51772901 0.7111755
## 2 1.98731633 -0.5705337
## 3 -0.01987466 -1.4374971
## 4 -0.25953926 -0.4195699
## 5 -0.65217659 0.3821734
## 6 -0.53799681 1.3342516
## Cleanliness.of.washrooms Comfort.of.waiting.gate.areas
## 1 0.6522375 0.7713291
## 2 -0.4303433 -0.5558663
## 3 -1.5592213 -1.3482915
## 4 -0.3566292 -0.5657224
## 5 0.4018425 0.3415382
## 6 1.2921138 1.3570129
## Cleanliness.of.airport.terminal Ambience.of.airport
## 1 0.7723444 0.7474366
## 2 -1.1496854 -0.8107155
## 3 -0.8689642 -1.1927787
## 4 -0.5421545 -0.4970816
## 5 0.4287284 0.3392059
## 6 1.3597313 1.4139332
## Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery
## 1 2.0141409 -1.53070971
## 2 -0.6214530 1.57682828
## 3 -0.1341269 -0.14585541
## 4 -0.3350580 -0.17039839
## 5 -0.4485236 -0.06955939
## 6 -0.4749794 0.33969463
mean.pca <- prcomp(sur.mean.sc)
summary(mean.pca)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6
## Standard deviation 3.9059 3.5682 1.50550 1.27027 0.36331 6.919e-16
## Proportion of Variance 0.4768 0.3979 0.07083 0.05042 0.00412 0.000e+00
## Cumulative Proportion 0.4768 0.8746 0.94545 0.99588 1.00000 1.000e+00
fviz_eig(mean.pca, type=c("barplot", "lines"))
screeplot(mean.pca, type='line')
biplot(mean.pca, col = 'purple', cex=0.5, expand=1)
The scree plot shows that 2 components are enough to explain 87% of the variance of the data, but since the features are highly associated, we want to dig further to assess the real drivers of Overall Satisfaction and provide recommendations.
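The cumulative proportion of variance reported by `summary()` can be recomputed from the component standard deviations, which makes the choice of the number of components explicit. The following is a minimal sketch on simulated data; the data frame `toy` and the 0.85 threshold are assumptions, not part of the survey analysis.

```r
set.seed(1)

# Simulated data: two strongly correlated columns plus one independent column
x1  <- rnorm(200)
toy <- data.frame(a = x1, b = x1 + rnorm(200, sd = 0.1), c = rnorm(200))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of variance per component is sdev^2 divided by the total
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
cum_var  <- cumsum(prop_var)

# Smallest number of components reaching the chosen threshold (0.85 is an assumption)
n_comp <- which(cum_var >= 0.85)[1]
n_comp
```

Applied to `mean.pca`, the same calculation would give the cumulative proportions shown in the summary table.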
We tried 3 methods of understanding what impacts the overall satisfaction of the passengers: linear modeling, relative importance, and random forest for feature importance.
All three give similar results, which makes us more confident in our recommendations.
set.seed(27705)
drop <- c("Date.recorded")
survey_without_date <- survey_imputed[,!(names(survey_imputed) %in% drop)]
survey_without_date[3:36]<-data.frame(scale(survey_without_date[3:36]))
model <- lm(formula= Overall.satisfaction~0+. , data=survey_without_date[3:36])
#head(survey_without_date[3:36])
summary(model)
##
## Call:
## lm(formula = Overall.satisfaction ~ 0 + ., data = survey_without_date[3:36])
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.34012 -0.16777 -0.04877 0.10972 2.50513
##
## Coefficients:
## Estimate Std. Error t value
## Ground.transportation.to.from.airport 0.0010338 0.0068947 0.150
## Parking.facilities 0.0058597 0.0156369 0.375
## Parking.facilities..value.for.money. -0.0041471 0.0156867 -0.264
## Availability.of.baggage.carts -0.0012865 0.0076643 -0.168
## Efficiency.of.check.in.staff 0.0156076 0.0123877 1.260
## Check.in.wait.time 0.0040762 0.0141078 0.289
## Courtesy.of.of.check.in.staff -0.0088048 0.0145479 -0.605
## Wait.time.at.passport.inspection 0.0114993 0.0142199 0.809
## Courtesy.of.inspection.staff 0.0033963 0.0143566 0.237
## Courtesy.of.security.staff 0.0167158 0.0093947 1.779
## Thoroughness.of.security.inspection -0.0068994 0.0107506 -0.642
## Wait.time.of.security.inspection 0.0117928 0.0102965 1.145
## Feeling.of.safety.and.security 0.0194412 0.0096302 2.019
## Ease.of.finding.your.way.through.the.airport 0.0050328 0.0080628 0.624
## Flight.information.screens -0.0013889 0.0072563 -0.191
## Walking.distance.inside.terminal 0.0283690 0.0080395 3.529
## Ease.of.making.connections 0.0314559 0.0069883 4.501
## Courtesy.of.airport.staff 0.0138858 0.0071778 1.935
## Restaurants -0.0069938 0.0117292 -0.596
## Restaurants..value.for.money. 0.0008908 0.0118580 0.075
## Availability.of.banks.ATM.money.changing -0.0031834 0.0080316 -0.396
## Shopping.facilities 0.0049635 0.0123461 0.402
## Shopping.facilities..value.for.money. -0.0003346 0.0127314 -0.026
## Internet.access 0.0043269 0.0068896 0.628
## Business.executive.lounges -0.0079730 0.0077794 -1.025
## Availability.of.washrooms 0.0048031 0.0103291 0.465
## Cleanliness.of.washrooms 0.0107972 0.0104092 1.037
## Comfort.of.waiting.gate.areas 0.0199827 0.0083003 2.407
## Cleanliness.of.airport.terminal 0.0398335 0.0090232 4.415
## Ambience.of.airport 0.1239953 0.0090130 13.757
## Arrivals.passport.and.visa.inspection -0.8314021 0.0089837 -92.546
## Speed.of.baggage.delivery 0.1167842 0.0081056 14.408
## Customs.inspection -0.0338725 0.0080715 -4.197
## Pr(>|t|)
## Ground.transportation.to.from.airport 0.880818
## Parking.facilities 0.707880
## Parking.facilities..value.for.money. 0.791510
## Availability.of.baggage.carts 0.866705
## Efficiency.of.check.in.staff 0.207782
## Check.in.wait.time 0.772648
## Courtesy.of.of.check.in.staff 0.545067
## Wait.time.at.passport.inspection 0.418757
## Courtesy.of.inspection.staff 0.813008
## Courtesy.of.security.staff 0.075282 .
## Thoroughness.of.security.inspection 0.521065
## Wait.time.of.security.inspection 0.252157
## Feeling.of.safety.and.security 0.043589 *
## Ease.of.finding.your.way.through.the.airport 0.532538
## Flight.information.screens 0.848215
## Walking.distance.inside.terminal 0.000423 ***
## Ease.of.making.connections 6.98e-06 ***
## Courtesy.of.airport.staff 0.053128 .
## Restaurants 0.551031
## Restaurants..value.for.money. 0.940119
## Availability.of.banks.ATM.money.changing 0.691860
## Shopping.facilities 0.687688
## Shopping.facilities..value.for.money. 0.979033
## Internet.access 0.530025
## Business.executive.lounges 0.305484
## Availability.of.washrooms 0.641954
## Cleanliness.of.washrooms 0.299678
## Comfort.of.waiting.gate.areas 0.016117 *
## Cleanliness.of.airport.terminal 1.04e-05 ***
## Ambience.of.airport < 2e-16 ***
## Arrivals.passport.and.visa.inspection < 2e-16 ***
## Speed.of.baggage.delivery < 2e-16 ***
## Customs.inspection 2.78e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3748 on 3401 degrees of freedom
## Multiple R-squared: 0.8608, Adjusted R-squared: 0.8594
## F-statistic: 637.3 on 33 and 3401 DF, p-value: < 2.2e-16
Relative importance is the percentage contribution of each predictor to the target variable.
To find the relative importance of the various predictors of Overall Satisfaction, we need coefficients for the predictors with respect to the dependent variable, i.e. Overall Satisfaction. The coefficients can be obtained by fitting a regression model. We use a linear model to obtain these coefficients for all variables other than Quarter and Date Recorded, for the reasons mentioned before.
To get relative importance, we need to provide the intercept too. Hence, we have not eliminated the intercept from our model.
set.seed(27705)
model1 <- lm(Overall.satisfaction~.-Date.recorded-Quarter, data=survey_omit)
summary(model1)
##
## Call:
## lm(formula = Overall.satisfaction ~ . - Date.recorded - Quarter,
## data = survey_omit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0680 -0.3051 -0.0535 0.2705 5.0746
##
## Coefficients:
## Estimate Std. Error
## (Intercept) 0.9193608 0.1001718
## Departure.timeEarly Morning -0.0492286 0.0582559
## Departure.timeEvening 0.0943968 0.0378137
## Departure.timeMorning -0.0384857 0.0327338
## Departure.timeNight -0.0491610 0.0627197
## Ground.transportation.to.from.airport -0.0007028 0.0064613
## Parking.facilities 0.0053897 0.0185858
## Parking.facilities..value.for.money. 0.0078831 0.0201707
## Availability.of.baggage.carts -0.0101960 0.0090102
## Efficiency.of.check.in.staff 0.0203886 0.0148824
## Check.in.wait.time 0.0227398 0.0168559
## Courtesy.of.of.check.in.staff -0.0289482 0.0171779
## Wait.time.at.passport.inspection -0.0072406 0.0154614
## Courtesy.of.inspection.staff 0.0164229 0.0159676
## Courtesy.of.security.staff 0.0226648 0.0138249
## Thoroughness.of.security.inspection 0.0055238 0.0177721
## Wait.time.of.security.inspection 0.0233914 0.0165996
## Feeling.of.safety.and.security 0.0060408 0.0170105
## Ease.of.finding.your.way.through.the.airport 0.0005833 0.0203197
## Flight.information.screens 0.0042216 0.0112018
## Walking.distance.inside.terminal 0.1044387 0.0194173
## Ease.of.making.connections 0.0456396 0.0121427
## Courtesy.of.airport.staff 0.0117223 0.0078067
## Restaurants 0.0017916 0.0124829
## Restaurants..value.for.money. -0.0109063 0.0136850
## Availability.of.banks.ATM.money.changing -0.0124474 0.0097654
## Shopping.facilities -0.0002189 0.0131586
## Shopping.facilities..value.for.money. 0.0140442 0.0152849
## Internet.access 0.0030859 0.0072340
## Business.executive.lounges -0.0273259 0.0129352
## Availability.of.washrooms 0.0133165 0.0152375
## Cleanliness.of.washrooms 0.0140168 0.0142325
## Comfort.of.waiting.gate.areas 0.0808887 0.0171749
## Cleanliness.of.airport.terminal 0.1453546 0.0227868
## Ambience.of.airport 0.2878309 0.0217436
## Arrivals.passport.and.visa.inspection -0.8848256 0.0083488
## Speed.of.baggage.delivery 0.1269883 0.0097345
## Customs.inspection -0.0264663 0.0087083
## t value Pr(>|t|)
## (Intercept) 9.178 < 2e-16 ***
## Departure.timeEarly Morning -0.845 0.398169
## Departure.timeEvening 2.496 0.012611 *
## Departure.timeMorning -1.176 0.239819
## Departure.timeNight -0.784 0.433220
## Ground.transportation.to.from.airport -0.109 0.913390
## Parking.facilities 0.290 0.771849
## Parking.facilities..value.for.money. 0.391 0.695965
## Availability.of.baggage.carts -1.132 0.257909
## Efficiency.of.check.in.staff 1.370 0.170817
## Check.in.wait.time 1.349 0.177437
## Courtesy.of.of.check.in.staff -1.685 0.092074 .
## Wait.time.at.passport.inspection -0.468 0.639608
## Courtesy.of.inspection.staff 1.029 0.303807
## Courtesy.of.security.staff 1.639 0.101253
## Thoroughness.of.security.inspection 0.311 0.755970
## Wait.time.of.security.inspection 1.409 0.158912
## Feeling.of.safety.and.security 0.355 0.722530
## Ease.of.finding.your.way.through.the.airport 0.029 0.977100
## Flight.information.screens 0.377 0.706306
## Walking.distance.inside.terminal 5.379 8.20e-08 ***
## Ease.of.making.connections 3.759 0.000175 ***
## Courtesy.of.airport.staff 1.502 0.133335
## Restaurants 0.144 0.885889
## Restaurants..value.for.money. -0.797 0.425554
## Availability.of.banks.ATM.money.changing -1.275 0.202555
## Shopping.facilities -0.017 0.986732
## Shopping.facilities..value.for.money. 0.919 0.358275
## Internet.access 0.427 0.669714
## Business.executive.lounges -2.113 0.034741 *
## Availability.of.washrooms 0.874 0.382241
## Cleanliness.of.washrooms 0.985 0.324795
## Comfort.of.waiting.gate.areas 4.710 2.62e-06 ***
## Cleanliness.of.airport.terminal 6.379 2.12e-10 ***
## Ambience.of.airport 13.238 < 2e-16 ***
## Arrivals.passport.and.visa.inspection -105.983 < 2e-16 ***
## Speed.of.baggage.delivery 13.045 < 2e-16 ***
## Customs.inspection -3.039 0.002397 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.667 on 2494 degrees of freedom
## Multiple R-squared: 0.9105, Adjusted R-squared: 0.9092
## F-statistic: 685.7 on 37 and 2494 DF, p-value: < 2.2e-16
From the model summary, we see that not all predictors have a statistically significant relationship with Overall Satisfaction. Hence, we may not need their coefficients to get their relative importance. We train another linear model with only the predictors that have a statistically significant relationship with Overall Satisfaction. We are not yet using the size of the coefficients to eliminate or choose predictors for this model; even if their coefficients are small, that will be reflected in the relative importance.
The predictors we chose for this model are: Departure.time, Walking.distance.inside.terminal, Ease.of.making.connections, Business.executive.lounges, Comfort.of.waiting.gate.areas, Cleanliness.of.airport.terminal, Ambience.of.airport, Arrivals.passport.and.visa.inspection, Speed.of.baggage.delivery, Customs.inspection
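The selection of significant predictors can also be done programmatically from the coefficient table of a fitted model. This is a minimal sketch on simulated data; the variables `x1`, `x2`, `x3`, `y` and the 0.05 cut-off are assumptions for illustration.

```r
set.seed(1)
n <- 300

# Simulated data: x1 and x2 drive y, x3 has no effect
toy   <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
toy$y <- 2 * toy$x1 + 0.5 * toy$x2 + rnorm(n)

fit <- lm(y ~ ., data = toy)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients

# p-values of the predictors, dropping the intercept row
pvals <- coefs[-1, "Pr(>|t|)"]

# Keep the terms below the chosen significance level
keep <- names(pvals)[pvals < 0.05]

# Build the reduced model formula from the retained terms
fmla_toy <- reformulate(keep, response = "y")
fmla_toy
```

For factor predictors such as Departure.time, the coefficient table lists one row per level (for example `Departure.timeEvening`), so the level names would need to be mapped back to the variable name before building the formula.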
fmla <- as.formula("Overall.satisfaction~Departure.time+Walking.distance.inside.terminal+Ease.of.making.connections+Business.executive.lounges+Comfort.of.waiting.gate.areas+Cleanliness.of.airport.terminal+Ambience.of.airport+Arrivals.passport.and.visa.inspection+Speed.of.baggage.delivery+Customs.inspection")
set.seed(27705)
model2 <- lm(formula=fmla, data=survey_omit)
summary(model2)
##
## Call:
## lm(formula = fmla, data = survey_omit)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.0984 -0.3064 -0.0555 0.2643 5.2093
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.067240 0.092293 11.564
## Departure.timeEarly Morning -0.065630 0.058156 -1.129
## Departure.timeEvening 0.077955 0.037813 2.062
## Departure.timeMorning -0.028875 0.032643 -0.885
## Departure.timeNight -0.054591 0.062657 -0.871
## Walking.distance.inside.terminal 0.129388 0.017261 7.496
## Ease.of.making.connections 0.024027 0.011322 2.122
## Business.executive.lounges -0.027009 0.011279 -2.395
## Comfort.of.waiting.gate.areas 0.106181 0.016202 6.554
## Cleanliness.of.airport.terminal 0.159379 0.022580 7.059
## Ambience.of.airport 0.301419 0.021572 13.973
## Arrivals.passport.and.visa.inspection -0.885748 0.008371 -105.808
## Speed.of.baggage.delivery 0.135735 0.009565 14.191
## Customs.inspection -0.020322 0.008446 -2.406
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## Departure.timeEarly Morning 0.2592
## Departure.timeEvening 0.0393 *
## Departure.timeMorning 0.3765
## Departure.timeNight 0.3837
## Walking.distance.inside.terminal 9.06e-14 ***
## Ease.of.making.connections 0.0339 *
## Business.executive.lounges 0.0167 *
## Comfort.of.waiting.gate.areas 6.78e-11 ***
## Cleanliness.of.airport.terminal 2.17e-12 ***
## Ambience.of.airport < 2e-16 ***
## Arrivals.passport.and.visa.inspection < 2e-16 ***
## Speed.of.baggage.delivery < 2e-16 ***
## Customs.inspection 0.0162 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6716 on 2518 degrees of freedom
## Multiple R-squared: 0.9084, Adjusted R-squared: 0.9079
## F-statistic: 1920 on 13 and 2518 DF, p-value: < 2.2e-16
We have fit this model with the selected predictors. As the results show, this model explains almost 91% of the variance of Overall Satisfaction. We can use this limited number of predictors and their coefficients to get their relative importance for Overall Satisfaction.
To get the relative importance, we used the R package relaimpo, written by Ulrike Grömping, who maintains the CRAN Task View for Design of Experiments.
“The R package, relaimpo, implements several reasonable procedures from the statistical literature to assign something that looks like a percent contribution to each correlated predictor.” source: www.r-bloggers.com
calc.relimp() calculates relative importance metrics for the linear model. We use the lmg method, which partitions R-squared by averaging each predictor's sequential contribution over all orderings of the predictors. This is the same decomposition as the metric known as Shapley Value Regression.
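The averaging over orderings that lmg performs can be sketched in base R. This is only a minimal illustration on the built-in mtcars data, not on the survey data, and the helper names r_squared() and all_orders() are hypothetical. It is not how relaimpo computes the metric internally.

```r
# Illustrative data only: the built-in mtcars data set, not the survey data.
predictors <- c("wt", "hp", "disp")
response   <- "mpg"

# R-squared of a linear model that uses the given subset of predictors.
r_squared <- function(vars) {
  if (length(vars) == 0) return(0)
  fit <- lm(reformulate(vars, response = response), data = mtcars)
  summary(fit)$r.squared
}

# All orderings (permutations) of a character vector.
all_orders <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (rest in all_orders(x[-i])) {
      out[[length(out) + 1]] <- c(x[i], rest)
    }
  }
  out
}

orders <- all_orders(predictors)
lmg <- setNames(numeric(length(predictors)), predictors)

# For each ordering, credit each predictor with the R-squared it adds
# when it enters the model after the predictors that precede it.
for (ord in orders) {
  for (k in seq_along(ord)) {
    gain <- r_squared(ord[seq_len(k)]) - r_squared(ord[seq_len(k - 1)])
    lmg[ord[k]] <- lmg[ord[k]] + gain
  }
}
lmg <- lmg / length(orders)

# The averaged contributions add up to the full-model R-squared;
# dividing by it gives shares that sum to 1, as with rela = TRUE.
full_r2   <- r_squared(predictors)
lmg_share <- lmg / full_r2
print(round(lmg_share, 3))
```

The number of orderings grows factorially with the number of predictors, which is why a dedicated package is preferable for a model with ten groups of regressors.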
library(relaimpo)
## Loading required package: MASS
## Loading required package: boot
## Loading required package: survey
## Loading required package: Matrix
## Loading required package: survival
##
## Attaching package: 'survival'
## The following object is masked from 'package:boot':
##
## aml
##
## Attaching package: 'survey'
## The following object is masked from 'package:graphics':
##
## dotchart
## Loading required package: mitools
## This is the global version of package relaimpo.
## If you are a non-US user, a version with the interesting additional metric pmvd is available
## from Ulrike Groempings web site at prof.beuth-hochschule.de/groemping.
rel.imp <- calc.relimp(model2, type = c("lmg"), rela = TRUE)
rel.imp
## Response variable: Overall.satisfaction
## Total response variance: 4.897195
## Analysis based on 2532 observations
##
## 13 Regressors:
## Some regressors combined in groups:
## Group Departure.time : Departure.timeEarly Morning Departure.timeEvening Departure.timeMorning Departure.timeNight
##
## Relative importance of 10 (groups of) regressors assessed:
## Departure.time Walking.distance.inside.terminal Ease.of.making.connections Business.executive.lounges Comfort.of.waiting.gate.areas Cleanliness.of.airport.terminal Ambience.of.airport Arrivals.passport.and.visa.inspection Speed.of.baggage.delivery Customs.inspection
##
## Proportion of variance explained by model: 90.84%
## Metrics are normalized to sum to 100% (rela=TRUE).
##
## Relative importance metrics:
##
## lmg
## Departure.time 0.0052825848
## Walking.distance.inside.terminal 0.0036392182
## Ease.of.making.connections 0.0066377888
## Business.executive.lounges 0.0009527126
## Comfort.of.waiting.gate.areas 0.0047996977
## Cleanliness.of.airport.terminal 0.0073470857
## Ambience.of.airport 0.0114385209
## Arrivals.passport.and.visa.inspection 0.6681417591
## Speed.of.baggage.delivery 0.1733133758
## Customs.inspection 0.1184472564
##
## Average coefficients for different model sizes:
##
## 1group 2groups 3groups
## Departure.timeEarly Morning -0.52860577 -0.447038136 -0.37413052
## Departure.timeEvening -0.45040920 -0.373157079 -0.30016956
## Departure.timeMorning 0.19232677 0.132876462 0.08328254
## Departure.timeNight 0.03324125 0.007613281 -0.01250115
## Walking.distance.inside.terminal 0.16175345 0.155457726 0.14923250
## Ease.of.making.connections 0.24195928 0.204047918 0.16936312
## Business.executive.lounges -0.04616400 -0.039864042 -0.03486394
## Comfort.of.waiting.gate.areas 0.17087382 0.163808593 0.15596452
## Cleanliness.of.airport.terminal 0.25375250 0.252416940 0.24905747
## Ambience.of.airport 0.26342528 0.268934744 0.27302731
## Arrivals.passport.and.visa.inspection -0.92243302 -0.919697368 -0.91688553
## Speed.of.baggage.delivery 0.78366314 0.700634815 0.61799527
## Customs.inspection -0.59238255 -0.513579281 -0.43777934
## 4groups 5groups 6groups
## Departure.timeEarly Morning -0.30980087 -0.25358029 -0.204747479
## Departure.timeEvening -0.23209408 -0.16916964 -0.111309581
## Departure.timeMorning 0.04333447 0.01255354 -0.009692403
## Departure.timeNight -0.02786588 -0.03922293 -0.047200685
## Walking.distance.inside.terminal 0.14332986 0.13809733 0.133862138
## Ease.of.making.connections 0.13833202 0.11108081 0.087520216
## Business.executive.lounges -0.03100108 -0.02819378 -0.026381088
## Comfort.of.waiting.gate.areas 0.14757067 0.13897968 0.130590252
## Cleanliness.of.airport.terminal 0.24337208 0.23522000 0.224580358
## Ambience.of.airport 0.27624553 0.27911402 0.282101698
## Arrivals.passport.and.visa.inspection -0.91374543 -0.91013886 -0.906024936
## Speed.of.baggage.delivery 0.53747739 0.46018876 0.386780227
## Customs.inspection -0.36542624 -0.29690348 -0.232520606
## 7groups 8groups 9groups
## Departure.timeEarly Morning -0.16244024 -0.125744174 -0.09375645
## Departure.timeEvening -0.05818589 -0.009314789 0.03587020
## Departure.timeMorning -0.02412821 -0.031547717 -0.03281461
## Departure.timeNight -0.05231162 -0.054987614 -0.05561965
## Walking.distance.inside.terminal 0.13085343 0.129162929 0.12873751
## Ease.of.making.connections 0.06741029 0.050407912 0.03609961
## Business.executive.lounges -0.02548698 -0.025401536 -0.02597293
## Comfort.of.waiting.gate.areas 0.12279280 0.115937415 0.11032029
## Cleanliness.of.airport.terminal 0.21151930 0.196167719 0.17870967
## Ambience.of.airport 0.28559429 0.289877517 0.29512945
## Arrivals.passport.and.visa.inspection -0.90143742 -0.896458853 -0.89119444
## Speed.of.baggage.delivery 0.31757958 0.252697621 0.19211221
## Customs.inspection -0.17251488 -0.117063170 -0.06629817
## 10groups
## Departure.timeEarly Morning -0.06563006
## Departure.timeEvening 0.07795468
## Departure.timeMorning -0.02887488
## Departure.timeNight -0.05459062
## Walking.distance.inside.terminal 0.12938759
## Ease.of.making.connections 0.02402655
## Business.executive.lounges -0.02700927
## Comfort.of.waiting.gate.areas 0.10618082
## Cleanliness.of.airport.terminal 0.15937904
## Ambience.of.airport 0.30141892
## Arrivals.passport.and.visa.inspection -0.88574785
## Speed.of.baggage.delivery 0.13573491
## Customs.inspection -0.02032167
rel.imp$lmg *100
## Departure.time
## 0.52825848
## Walking.distance.inside.terminal
## 0.36392182
## Ease.of.making.connections
## 0.66377888
## Business.executive.lounges
## 0.09527126
## Comfort.of.waiting.gate.areas
## 0.47996977
## Cleanliness.of.airport.terminal
## 0.73470857
## Ambience.of.airport
## 1.14385209
## Arrivals.passport.and.visa.inspection
## 66.81417591
## Speed.of.baggage.delivery
## 17.33133758
## Customs.inspection
## 11.84472564
rel.imp$lmg.rank
## Departure.time
## 7
## Walking.distance.inside.terminal
## 9
## Ease.of.making.connections
## 6
## Business.executive.lounges
## 10
## Comfort.of.waiting.gate.areas
## 8
## Cleanliness.of.airport.terminal
## 5
## Ambience.of.airport
## 4
## Arrivals.passport.and.visa.inspection
## 1
## Speed.of.baggage.delivery
## 2
## Customs.inspection
## 3
Proportion of variance explained by model: 90.84%
As we can see, the lmg method gives the following relative importance, as a percentage of the explained variance:
Departure.time 0.528%
Walking.distance.inside.terminal 0.364%
Ease.of.making.connections 0.664%
Business.executive.lounges 0.095%
Comfort.of.waiting.gate.areas 0.480%
Cleanliness.of.airport.terminal 0.735%
Ambience.of.airport 1.144%
Arrivals.passport.and.visa.inspection 66.814%
Speed.of.baggage.delivery 17.331%
Customs.inspection 11.845%
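Because rela = TRUE normalizes the metrics to sum to 100%, each value is a share of the explained variance rather than of the total variance. As a rough sketch, the shares can be rescaled by the model's R-squared to approximate each driver's share of the total response variance. The numbers below are copied from the calc.relimp() output; the variable names lmg_share and abs_contribution are introduced here only for illustration.

```r
# Normalized lmg shares copied from the calc.relimp() output (rela = TRUE).
lmg_share <- c(
  Departure.time                        = 0.0052825848,
  Walking.distance.inside.terminal      = 0.0036392182,
  Ease.of.making.connections            = 0.0066377888,
  Business.executive.lounges            = 0.0009527126,
  Comfort.of.waiting.gate.areas         = 0.0047996977,
  Cleanliness.of.airport.terminal       = 0.0073470857,
  Ambience.of.airport                   = 0.0114385209,
  Arrivals.passport.and.visa.inspection = 0.6681417591,
  Speed.of.baggage.delivery             = 0.1733133758,
  Customs.inspection                    = 0.1184472564
)

# "Proportion of variance explained by model" from the same output.
r2 <- 0.9084

# The normalized shares sum to 1.
total_share <- sum(lmg_share)

# Approximate share of the total response variance attributed to each driver.
abs_contribution <- lmg_share * r2

# Drivers sorted from most to least important, in percent of explained variance.
ranked <- sort(lmg_share, decreasing = TRUE)
print(round(ranked * 100, 2))
```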
The linear model assumes a linear relationship between the drivers and Overall Satisfaction, and the survey responses are highly associated with one another, so we also considered a more flexible model, a random forest.
Preparing data for Random Forest Model
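# Convert the rating columns (4 to 37) to ordered factors.
# as.ordered() accepts non-factor input, so a separate as.factor() call is not needed.
for (i in 4:37) {
  survey_omit[, i] <- as.ordered(survey_omit[, i])
}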
survey_omit$Quarter <- NULL
survey_omit$Date.recorded <- NULL
Model
set.seed(27705)
library(randomForest)
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
rf <- randomForest(Overall.satisfaction ~. , data=survey_omit, importance = TRUE)
rf$importance
## 0 1 2
## Departure.time 0.0002013969 0 0.0003333333
## Ground.transportation.to.from.airport 0.0004307648 0 0.0000000000
## Parking.facilities 0.0006022056 0 0.0000000000
## Parking.facilities..value.for.money. 0.0004920788 0 0.0000000000
## Availability.of.baggage.carts 0.0004436029 0 0.0006666667
## Efficiency.of.check.in.staff 0.0018011572 0 0.0010000000
## Check.in.wait.time 0.0024575829 0 -0.0016666667
## Courtesy.of.of.check.in.staff 0.0021730526 0 0.0003333333
## Wait.time.at.passport.inspection 0.0030470929 0 -0.0028333333
## Courtesy.of.inspection.staff 0.0026914811 0 -0.0040000000
## Courtesy.of.security.staff 0.0013841036 0 -0.0005000000
## Thoroughness.of.security.inspection 0.0018070536 0 0.0040000000
## Wait.time.of.security.inspection 0.0025554609 0 0.0021666667
## Feeling.of.safety.and.security 0.0031344658 0 -0.0036666667
## Ease.of.finding.your.way.through.the.airport 0.0005518231 0 0.0035000000
## Flight.information.screens 0.0001552117 0 0.0063333333
## Walking.distance.inside.terminal 0.0009524641 0 0.0055000000
## Ease.of.making.connections 0.0021159312 0 0.0000000000
## Courtesy.of.airport.staff 0.0002008149 0 -0.0120000000
## Restaurants 0.0015366817 0 0.0000000000
## Restaurants..value.for.money. 0.0015769618 0 0.0000000000
## Availability.of.banks.ATM.money.changing 0.0002464875 0 -0.0030000000
## Shopping.facilities 0.0014366192 0 0.0003333333
## Shopping.facilities..value.for.money. 0.0022882329 0 0.0020000000
## Internet.access 0.0002469587 0 0.0010000000
## Business.executive.lounges 0.0004486848 0 0.0000000000
## Availability.of.washrooms 0.0020751401 0 -0.0025000000
## Cleanliness.of.washrooms 0.0030276999 0 0.0086666667
## Comfort.of.waiting.gate.areas 0.0036870193 0 0.0128333333
## Cleanliness.of.airport.terminal 0.0035411703 0 0.0131666667
## Ambience.of.airport 0.0041679819 0 0.0128333333
## Arrivals.passport.and.visa.inspection 0.2493411142 0 0.0090000000
## Speed.of.baggage.delivery 0.1081883903 0 -0.0001666667
## Customs.inspection 0.0141381323 0 0.0050000000
## 3 4
## Departure.time 0.0037562050 2.325007e-03
## Ground.transportation.to.from.airport 0.0014373995 2.107434e-03
## Parking.facilities 0.0010595075 1.671025e-04
## Parking.facilities..value.for.money. 0.0011751197 3.233654e-04
## Availability.of.baggage.carts 0.0053944849 1.992465e-03
## Efficiency.of.check.in.staff 0.0016133008 2.530866e-03
## Check.in.wait.time 0.0109787108 6.456282e-04
## Courtesy.of.of.check.in.staff 0.0030188454 2.809828e-03
## Wait.time.at.passport.inspection 0.0222749697 -1.450338e-03
## Courtesy.of.inspection.staff 0.0098526119 -3.479993e-04
## Courtesy.of.security.staff 0.0266248240 2.381683e-03
## Thoroughness.of.security.inspection 0.0415393889 5.639152e-03
## Wait.time.of.security.inspection 0.0235714856 -3.537745e-03
## Feeling.of.safety.and.security 0.0262847309 2.593797e-03
## Ease.of.finding.your.way.through.the.airport 0.0288575981 -3.185728e-05
## Flight.information.screens 0.0201140806 1.562988e-03
## Walking.distance.inside.terminal 0.0515735882 -2.951345e-03
## Ease.of.making.connections 0.0009568472 1.089356e-04
## Courtesy.of.airport.staff 0.0508051148 9.124264e-04
## Restaurants 0.0110669776 1.966803e-03
## Restaurants..value.for.money. 0.0063810231 8.454966e-04
## Availability.of.banks.ATM.money.changing 0.0009022420 2.775898e-03
## Shopping.facilities 0.0071184898 5.115951e-03
## Shopping.facilities..value.for.money. 0.0029911285 3.762203e-03
## Internet.access 0.0044698136 2.421927e-04
## Business.executive.lounges 0.0005990089 4.825242e-04
## Availability.of.washrooms 0.0680645727 1.967591e-02
## Cleanliness.of.washrooms 0.0551669120 1.281529e-02
## Comfort.of.waiting.gate.areas 0.0994293842 6.625082e-03
## Cleanliness.of.airport.terminal 0.1245772144 2.715951e-02
## Ambience.of.airport 0.1268695559 5.109896e-02
## Arrivals.passport.and.visa.inspection 0.1567563324 2.266520e-01
## Speed.of.baggage.delivery 0.0118883193 1.725672e-02
## Customs.inspection 0.0347743174 4.938032e-02
## 5
## Departure.time 0.0046008193
## Ground.transportation.to.from.airport 0.0028761589
## Parking.facilities 0.0013893261
## Parking.facilities..value.for.money. 0.0006685922
## Availability.of.baggage.carts 0.0006336127
## Efficiency.of.check.in.staff 0.0092048049
## Check.in.wait.time 0.0079923772
## Courtesy.of.of.check.in.staff 0.0108037042
## Wait.time.at.passport.inspection 0.0053886653
## Courtesy.of.inspection.staff 0.0079362803
## Courtesy.of.security.staff 0.0144159514
## Thoroughness.of.security.inspection 0.0181849071
## Wait.time.of.security.inspection 0.0231823567
## Feeling.of.safety.and.security 0.0183176576
## Ease.of.finding.your.way.through.the.airport 0.0070239251
## Flight.information.screens 0.0024934103
## Walking.distance.inside.terminal 0.0100574775
## Ease.of.making.connections 0.0011413432
## Courtesy.of.airport.staff 0.0042437332
## Restaurants 0.0092830426
## Restaurants..value.for.money. 0.0051038297
## Availability.of.banks.ATM.money.changing 0.0017795486
## Shopping.facilities 0.0056840678
## Shopping.facilities..value.for.money. 0.0057866611
## Internet.access 0.0035114625
## Business.executive.lounges 0.0010764334
## Availability.of.washrooms 0.0114800751
## Cleanliness.of.washrooms 0.0142066094
## Comfort.of.waiting.gate.areas 0.0321677740
## Cleanliness.of.airport.terminal 0.0475761802
## Ambience.of.airport 0.0649759003
## Arrivals.passport.and.visa.inspection 0.2752243065
## Speed.of.baggage.delivery 0.0203153727
## Customs.inspection 0.0892569571
## MeanDecreaseAccuracy
## Departure.time 0.0015555333
## Ground.transportation.to.from.airport 0.0012475202
## Parking.facilities 0.0007016814
## Parking.facilities..value.for.money. 0.0005140451
## Availability.of.baggage.carts 0.0009029363
## Efficiency.of.check.in.staff 0.0034614861
## Check.in.wait.time 0.0036024042
## Courtesy.of.of.check.in.staff 0.0041027118
## Wait.time.at.passport.inspection 0.0033867193
## Courtesy.of.inspection.staff 0.0034916113
## Courtesy.of.security.staff 0.0050736908
## Thoroughness.of.security.inspection 0.0071481220
## Wait.time.of.security.inspection 0.0065729533
## Feeling.of.safety.and.security 0.0069724707
## Ease.of.finding.your.way.through.the.airport 0.0027271365
## Flight.information.screens 0.0015144143
## Walking.distance.inside.terminal 0.0038719606
## Ease.of.making.connections 0.0015428921
## Courtesy.of.airport.staff 0.0027390682
## Restaurants 0.0035212939
## Restaurants..value.for.money. 0.0023493875
## Availability.of.banks.ATM.money.changing 0.0009874546
## Shopping.facilities 0.0030877108
## Shopping.facilities..value.for.money. 0.0032608416
## Internet.access 0.0010906567
## Business.executive.lounges 0.0005934971
## Availability.of.washrooms 0.0089789033
## Cleanliness.of.washrooms 0.0086156480
## Comfort.of.waiting.gate.areas 0.0131939613
## Cleanliness.of.airport.terminal 0.0205033254
## Ambience.of.airport 0.0284659665
## Arrivals.passport.and.visa.inspection 0.2473645655
## Speed.of.baggage.delivery 0.0717289045
## Customs.inspection 0.0361283974
## MeanDecreaseGini
## Departure.time 27.578207
## Ground.transportation.to.from.airport 17.426113
## Parking.facilities 11.319293
## Parking.facilities..value.for.money. 12.427110
## Availability.of.baggage.carts 11.794447
## Efficiency.of.check.in.staff 14.997081
## Check.in.wait.time 13.846835
## Courtesy.of.of.check.in.staff 14.879923
## Wait.time.at.passport.inspection 15.864762
## Courtesy.of.inspection.staff 14.349553
## Courtesy.of.security.staff 17.744886
## Thoroughness.of.security.inspection 20.598643
## Wait.time.of.security.inspection 19.553825
## Feeling.of.safety.and.security 19.208359
## Ease.of.finding.your.way.through.the.airport 14.176076
## Flight.information.screens 13.758136
## Walking.distance.inside.terminal 18.409828
## Ease.of.making.connections 9.635924
## Courtesy.of.airport.staff 20.459790
## Restaurants 20.733673
## Restaurants..value.for.money. 22.008529
## Availability.of.banks.ATM.money.changing 11.687861
## Shopping.facilities 17.636889
## Shopping.facilities..value.for.money. 18.337668
## Internet.access 21.788125
## Business.executive.lounges 7.123887
## Availability.of.washrooms 25.429946
## Cleanliness.of.washrooms 26.601844
## Comfort.of.waiting.gate.areas 34.788871
## Cleanliness.of.airport.terminal 46.902134
## Ambience.of.airport 56.302689
## Arrivals.passport.and.visa.inspection 552.744328
## Speed.of.baggage.delivery 169.216008
## Customs.inspection 102.332635
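The MeanDecreaseAccuracy columns above are permutation-based: a predictor counts as important if shuffling its values makes the predictions worse. The idea can be sketched in base R as follows. This is a simplified, in-sample illustration with a linear model on the built-in mtcars data standing in for the forest; randomForest itself works per tree on out-of-bag observations. The helper permutation_importance() is hypothetical and not part of any package.

```r
set.seed(1)

# Illustrative stand-in: a linear model on mtcars instead of the survey forest.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Mean squared prediction error of a model on a data set.
mse <- function(model, data) {
  mean((data$mpg - predict(model, newdata = data))^2)
}

# Hypothetical helper: average increase in error after shuffling one column.
permutation_importance <- function(model, data, var, n_rep = 50) {
  baseline <- mse(model, data)
  increases <- replicate(n_rep, {
    shuffled <- data
    shuffled[[var]] <- sample(shuffled[[var]])  # break the link to the response
    mse(model, shuffled) - baseline
  })
  mean(increases)
}

predictors <- c("wt", "hp", "qsec")
importance <- sapply(predictors, function(v) permutation_importance(fit, mtcars, v))

# Larger values mean the model relies more on that predictor.
print(sort(importance, decreasing = TRUE))
```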
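The importance matrix reports, for each predictor, the mean decrease in accuracy for each satisfaction level (columns 0 to 5) and overall (MeanDecreaseAccuracy), along with the mean decrease in Gini impurity (MeanDecreaseGini). A random forest has no coefficients; instead, these measures rank predictors by how much accuracy drops when a predictor's values are permuted and by how much splits on that predictor reduce node impurity. The measure_importance() function in the randomForestExplainer library gives a fuller picture. randomForestExplainer contains a set of tools to help explain which variables are most important in a random forest.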
library(randomForestExplainer)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
importance_rf <- measure_importance(rf)
min_depth_frame <- min_depth_distribution(rf)
save(min_depth_frame, file = "min_depth_frame.rda")
load("min_depth_frame.rda")
head(min_depth_frame, n = 10)
## tree variable minimal_depth
## 1 1 Ambience.of.airport 1
## 2 1 Arrivals.passport.and.visa.inspection 2
## 3 1 Availability.of.baggage.carts 5
## 4 1 Availability.of.banks.ATM.money.changing 4
## 5 1 Availability.of.washrooms 6
## 6 1 Business.executive.lounges 3
## 7 1 Check.in.wait.time 6
## 8 1 Cleanliness.of.airport.terminal 1
## 9 1 Cleanliness.of.washrooms 2
## 10 1 Comfort.of.waiting.gate.areas 3
plot_min_depth_distribution(min_depth_frame)
plot_min_depth_distribution(min_depth_frame, mean_sample = "relevant_trees", k = 15)
importance_rf
## variable mean_min_depth no_of_nodes
## 1 Ambience.of.airport 2.674000 4940
## 2 Arrivals.passport.and.visa.inspection 2.078000 7175
## 3 Availability.of.baggage.carts 4.958000 3480
## 4 Availability.of.banks.ATM.money.changing 4.940000 3425
## 5 Availability.of.washrooms 3.976000 4794
## 6 Business.executive.lounges 5.353864 2010
## 7 Check.in.wait.time 4.900000 3930
## 8 Cleanliness.of.airport.terminal 2.890000 3893
## 9 Cleanliness.of.washrooms 3.732000 5035
## 10 Comfort.of.waiting.gate.areas 3.226000 5280
## 11 Courtesy.of.airport.staff 4.462000 4611
## 12 Courtesy.of.inspection.staff 4.718000 4100
## 13 Courtesy.of.of.check.in.staff 4.568000 3690
## 14 Courtesy.of.security.staff 4.174000 3917
## 15 Customs.inspection 2.945864 2534
## 16 Departure.time 3.652000 6803
## 17 Ease.of.finding.your.way.through.the.airport 4.368000 3523
## 18 Ease.of.making.connections 4.913864 2142
## 19 Efficiency.of.check.in.staff 4.606000 3768
## 20 Feeling.of.safety.and.security 4.028000 3717
## 21 Flight.information.screens 5.096644 3754
## 22 Ground.transportation.to.from.airport 4.924000 5444
## 23 Internet.access 4.602000 6439
## 24 Parking.facilities 5.106000 3627
## 25 Parking.facilities..value.for.money. 4.898000 3851
## 26 Restaurants 4.648000 6161
## 27 Restaurants..value.for.money. 4.638000 6688
## 28 Shopping.facilities 4.554000 5208
## 29 Shopping.facilities..value.for.money. 4.464000 5477
## 30 Speed.of.baggage.delivery 2.224000 6069
## 31 Thoroughness.of.security.inspection 4.046644 3969
## 32 Wait.time.at.passport.inspection 4.494000 4451
## 33 Wait.time.of.security.inspection 3.952000 4297
## 34 Walking.distance.inside.terminal 4.290000 4093
## accuracy_decrease gini_decrease no_of_trees times_a_root p_value
## 1 0.0284659665 56.302689 500 53 3.182806e-12
## 2 0.2473645655 552.744328 500 74 9.626574e-311
## 3 0.0009029363 11.794447 500 0 1.000000e+00
## 4 0.0009874546 11.687861 500 0 1.000000e+00
## 5 0.0089789033 25.429946 500 23 1.204825e-06
## 6 0.0005934971 7.123887 494 0 1.000000e+00
## 7 0.0036024042 13.846835 500 2 1.000000e+00
## 8 0.0205033254 46.902134 500 50 1.000000e+00
## 9 0.0086156480 26.601844 500 26 7.120511e-17
## 10 0.0131939613 34.788871 500 35 1.683698e-32
## 11 0.0027390682 20.459790 500 14 2.366094e-02
## 12 0.0034916113 14.349553 500 3 1.000000e+00
## 13 0.0041027118 14.879923 500 7 1.000000e+00
## 14 0.0050736908 17.744886 500 19 1.000000e+00
## 15 0.0361283974 102.332635 494 50 1.000000e+00
## 16 0.0015555333 27.578207 500 0 4.537606e-236
## 17 0.0027271365 14.176076 500 4 1.000000e+00
## 18 0.0015428921 9.635924 494 2 1.000000e+00
## 19 0.0034614861 14.997081 500 9 1.000000e+00
## 20 0.0069724707 19.208359 500 15 1.000000e+00
## 21 0.0015144143 13.758136 499 3 1.000000e+00
## 22 0.0012475202 17.426113 500 0 8.271402e-46
## 23 0.0010906567 21.788125 500 0 6.060606e-172
## 24 0.0007016814 11.319293 500 0 1.000000e+00
## 25 0.0005140451 12.427110 500 1 1.000000e+00
## 26 0.0035212939 20.733673 500 1 3.210666e-129
## 27 0.0023493875 22.008529 500 0 7.831645e-215
## 28 0.0030877108 17.636889 500 2 2.260923e-27
## 29 0.0032608416 18.337668 500 0 9.336441e-49
## 30 0.0717289045 169.216008 500 63 2.565869e-116
## 31 0.0071481220 20.598643 499 15 1.000000e+00
## 32 0.0033867193 15.864762 500 4 6.679741e-01
## 33 0.0065729533 19.553825 500 18 9.973488e-01
## 34 0.0038719606 18.409828 500 7 1.000000e+00
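The p_value column appears to come from a one-sided binomial test on no_of_nodes: it asks whether a variable is used for splitting more often than it would be if every split picked one of the 34 variables at random. The sketch below is an assumption based on the package documentation, not a copy of the package's code, and should be checked against the installed version of randomForestExplainer. The node counts are copied from the importance_rf output; p_value_for() is a hypothetical helper.

```r
# no_of_nodes for all 34 variables, copied from the importance_rf output
# in row order (rows 1 to 34).
no_of_nodes <- c(
  4940, 7175, 3480, 3425, 4794, 2010, 3930, 3893, 5035, 5280,
  4611, 4100, 3690, 3917, 2534, 6803, 3523, 2142, 3768, 3717,
  3754, 5444, 6439, 3627, 3851, 6161, 6688, 5208, 5477, 6069,
  3969, 4451, 4297, 4093
)

total_nodes <- sum(no_of_nodes)
n_vars      <- length(no_of_nodes)

# Hypothetical helper: one-sided binomial test of whether a variable splits
# more nodes than expected under uniform random choice among the variables.
p_value_for <- function(nodes) {
  binom.test(nodes, total_nodes, p = 1 / n_vars, alternative = "greater")$p.value
}

p_wait_passport <- p_value_for(4451)  # row 32, Wait.time.at.passport.inspection
p_ambience      <- p_value_for(4940)  # row 1, Ambience.of.airport

print(c(wait_passport = p_wait_passport, ambience = p_ambience))
```

Under this reading, a small p-value means a variable is chosen for splits unusually often. It says nothing about whether those splits sit near the root, which is why it can disagree with mean_min_depth.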
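This table shows several importance measures for each variable, along with a p-value that tests whether the variable is used for splitting in more nodes than would be expected by chance. We can get the top 10 variables using the important_variables() function.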
(vars <- important_variables(importance_rf, k = 10, measures = c("mean_min_depth", "no_of_trees")))
## [1] "Arrivals.passport.and.visa.inspection"
## [2] "Speed.of.baggage.delivery"
## [3] "Ambience.of.airport"
## [4] "Cleanliness.of.airport.terminal"
## [5] "Comfort.of.waiting.gate.areas"
## [6] "Customs.inspection"
## [7] "Departure.time"
## [8] "Cleanliness.of.washrooms"
## [9] "Wait.time.of.security.inspection"
## [10] "Availability.of.washrooms"
We now have the top 10 features that affect customers' Overall Satisfaction. The ranking is based on the mean minimal depth at which each variable appears in the trees and on the number of trees in which it occurs. Because these measures are aggregated over all 500 trees, the ranking should be reasonably stable.
hist(survey.df$Arrivals.passport.and.visa.inspection, main="Arrivals Passport & Visa Inspection")
hist(survey.df$Speed.of.baggage.delivery)
hist(survey.df$Ambience.of.airport)
hist(survey.df$Cleanliness.of.airport.terminal)
hist(survey.df$Comfort.of.waiting.gate.areas)
hist(survey.df$Customs.inspection)
hist(survey.df$Cleanliness.of.washrooms)
hist(survey.df$Wait.time.of.security.inspection)
hist(survey.df$Availability.of.washrooms)